{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Pivot review data with Elastic data frames\n",
    "\n",
    "This notebook shows how data can be pivoted with [Elastic data frames](https://www.elastic.co/guide/en/elastic-stack-overview/master/ml-dataframes.html) to reveal insights into the behaviour of reviewers. The use case and data is from Mark Harwood's talk on [entity-centric indexing](https://www.elastic.co/videos/entity-centric-indexing-mark-harwood).\n",
    "\n",
    "An alternative version of this notebook uses python [pandas](https://pandas.pydata.org/) to create the same results."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "import bz2\n",
    "import matplotlib.pyplot as plt\n",
    "import csv\n",
    "import time\n",
    "import pandas as pd\n",
    "\n",
    "from elasticsearch import helpers\n",
    "from elasticsearch import Elasticsearch\n",
    "from elasticsearch.exceptions import NotFoundError"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Connect to Elasticsearch\n",
    "\n",
    "First connect to Elasticsearch. This assumes access is via `localhost:9200`, change next line to change connection parameters (see https://elasticsearch-py.readthedocs.io/en/master/api.html)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "es = Elasticsearch()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Read data to Elasticsearch\n",
    "\n",
    "Note this deletes and creates indices."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'acknowledged': True, 'shards_acknowledged': True, 'index': 'anonreviews'}"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "index_name = \"anonreviews\"\n",
    "index_name_pivot = 'anonreviews_pivot'\n",
    "index_settings = { \"settings\": { \"number_of_shards\": 1, \"number_of_replicas\": 0 }, \"mappings\": { \"properties\": { \"reviewerId\": { \"type\": \"keyword\" }, \"vendorId\": { \"type\": \"keyword\" }, \"date\": { \"type\": \"date\", \"format\" : \"yyyy-MM-dd HH:mm\" }, \"rating\": { \"type\": \"integer\" } } } }\n",
    "es.indices.delete(index=index_name, ignore=[400, 404])\n",
    "es.indices.create(index=index_name, body=index_settings)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [],
   "source": [
    "actions = []\n",
    "bulk_batch_size = 10000\n",
    "n = 0\n",
    "\n",
    "csv_handle = bz2.open('./anonreviews.csv.bz2', 'rt')\n",
    "csv_reader = csv.DictReader(csv_handle)\n",
    "for row in csv_reader:\n",
    "    n += 1\n",
    "    action = { \"_index\": index_name, \"_source\": row }\n",
    "    actions.append(action)\n",
    "    \n",
    "    if n % bulk_batch_size == 0:\n",
    "        helpers.bulk(es, actions)\n",
    "        actions = []\n",
    "        \n",
    "if len(actions) > 0:\n",
    "    helpers.bulk(es, actions)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Flush to ensure all docs are indexed, then summarise indexing."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'count': 578805,\n",
       " '_shards': {'total': 1, 'successful': 1, 'skipped': 0, 'failed': 0}}"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "es.indices.flush(index=index_name)\n",
    "es.count(index=index_name)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Aggregate and Pivot data\n",
    "\n",
    "Pivot data so we get summaries for each reviewer."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In pandas, we do the following:\n",
    "\n",
    "```\n",
    "aggregations = {\n",
    "    'rating':'mean',\n",
    "    'vendorId':'nunique',\n",
    "    'reviewerId':'count'\n",
    "}\n",
    "\n",
    "grouped = reviews.groupby('reviewerId').agg(aggregations)\n",
    "grouped.columns=['avg_rating', 'dc_vendorId', 'count']\n",
    "```\n",
    "\n",
    "In Elasticsearch we can use data frames to achieve the same transformation. First preview the transformation:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [],
   "source": [
    "request = {\n",
    "  \"source\": {\n",
    "    \"index\": index_name\n",
    "  },\n",
    "  \"dest\": {\n",
    "    \"index\": index_name_pivot\n",
    "  },\n",
    "  \"pivot\": {\n",
    "    \"group_by\": {\n",
    "      \"reviewerId\": {\n",
    "        \"terms\": {\n",
    "          \"field\": \"reviewerId\"\n",
    "        }\n",
    "      }\n",
    "    },\n",
    "    \"aggregations\": {\n",
    "      \"avg_rating\": {\n",
    "        \"avg\": {\n",
    "          \"field\": \"rating\"\n",
    "        }\n",
    "      },\n",
    "      \"dc_vendorId\": {\n",
    "        \"cardinality\": {\n",
    "          \"field\": \"vendorId\"\n",
    "        }\n",
    "      },\n",
    "      \"count\": {\n",
    "        \"value_count\": {\n",
    "          \"field\": \"_id\"\n",
    "        }\n",
    "      }\n",
    "    }\n",
    "  }\n",
    "}\n",
    "\n",
    "response = es.transport.perform_request('POST', '/_data_frame/transforms/_preview', body=request)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[{'reviewerId': '0', 'avg_rating': 5.0, 'count': 1.0, 'dc_vendorId': 1.0},\n",
       " {'reviewerId': '1', 'avg_rating': 5.0, 'count': 9.0, 'dc_vendorId': 7.0},\n",
       " {'reviewerId': '10', 'avg_rating': 5.0, 'count': 1.0, 'dc_vendorId': 1.0},\n",
       " {'reviewerId': '100', 'avg_rating': 5.0, 'count': 1.0, 'dc_vendorId': 1.0},\n",
       " {'reviewerId': '1000', 'avg_rating': 4.5, 'count': 4.0, 'dc_vendorId': 4.0},\n",
       " {'reviewerId': '10000', 'avg_rating': 5.0, 'count': 1.0, 'dc_vendorId': 1.0},\n",
       " {'reviewerId': '100000', 'avg_rating': 5.0, 'count': 1.0, 'dc_vendorId': 1.0},\n",
       " {'reviewerId': '100001', 'avg_rating': 5.0, 'count': 1.0, 'dc_vendorId': 1.0},\n",
       " {'reviewerId': '100002', 'avg_rating': 5.0, 'count': 1.0, 'dc_vendorId': 1.0},\n",
       " {'reviewerId': '100003', 'avg_rating': 5.0, 'count': 1.0, 'dc_vendorId': 1.0}]"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "response['preview'][0:10]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Create Elastic Data Frame"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'acknowledged': True}"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# First delete old index if it exists\n",
    "es.indices.delete(index='anonreviews_pivot', ignore=[400, 404])\n",
    "\n",
    "# Stop and delete any old jobs (ignore if they don't exist)\n",
    "try:\n",
    "    es.transport.perform_request('POST', '/_data_frame/transforms/anonreviews_pivot/_stop')\n",
    "    es.transport.perform_request('DELETE', '/_data_frame/transforms/anonreviews_pivot')\n",
    "except NotFoundError:\n",
    "    pass\n",
    "    \n",
    "# Now create data frame job (called anonreviews_pivot)\n",
    "es.transport.perform_request('PUT', '/_data_frame/transforms/anonreviews_pivot', body=request)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'acknowledged': True}"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Start job\n",
    "es.transport.perform_request('POST', '/_data_frame/transforms/anonreviews_pivot/_start')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "{'total_docs': 578805, 'docs_remaining': 509542, 'percent_complete': 11.966551774777344}\n",
      "{'total_docs': 578805, 'docs_remaining': 452708, 'percent_complete': 21.78574822263111}\n",
      "{'total_docs': 578805, 'docs_remaining': 399715, 'percent_complete': 30.941336028541564}\n",
      "{'total_docs': 578805, 'docs_remaining': 351594, 'percent_complete': 39.255189571617386}\n",
      "{'total_docs': 578805, 'docs_remaining': 302940, 'percent_complete': 47.66112939591054}\n",
      "{'total_docs': 578805, 'docs_remaining': 255684, 'percent_complete': 55.82553709798637}\n",
      "{'total_docs': 578805, 'docs_remaining': 211096, 'percent_complete': 63.52899508470038}\n",
      "{'total_docs': 578805, 'docs_remaining': 172708, 'percent_complete': 70.16128056944912}\n",
      "{'total_docs': 578805, 'docs_remaining': 136928, 'percent_complete': 76.34298252433894}\n",
      "{'total_docs': 578805, 'docs_remaining': 86007, 'percent_complete': 85.14059139088295}\n",
      "{'total_docs': 578805, 'docs_remaining': 16588, 'percent_complete': 97.13409524796779}\n",
      "{'total_docs': 578805, 'docs_remaining': 0, 'percent_complete': 100.0}\n"
     ]
    }
   ],
   "source": [
    "# Poll for progress\n",
    "while True:\n",
    "    response = es.transport.perform_request('GET', '/_data_frame/transforms/anonreviews_pivot/_stats')\n",
    "    \n",
    "    if response['transforms'][0]['state']['task_state'] == 'stopped':\n",
    "        print(response['transforms'][0]['state']['progress'])\n",
    "        break\n",
    "    if 'progress' in response['transforms'][0]['state']:\n",
    "        print(response['transforms'][0]['state']['progress'])\n",
    "    time.sleep(5)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [],
   "source": [
    "def hits_to_df(response, create_index=True):\n",
    "    hits = []\n",
    "    index = []\n",
    "    for hit in response['hits']['hits']:\n",
    "        hits.append(hit['_source'])\n",
    "        index.append(hit['_source']['reviewerId'])\n",
    "    if create_index:\n",
    "        return pd.DataFrame(hits, index=index)\n",
    "    else:\n",
    "        return pd.DataFrame(hits)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Find 'haters'\n",
    "\n",
    "Reviewers that give more than five zero star reviews to one vendor"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [],
   "source": [
    "q =  \"dc_vendorId:1 AND count :>5 AND avg_rating:0\"\n",
    "sort = \"count:desc\"\n",
    "\n",
    "response = es.search(index='anonreviews_pivot', q=q, sort=sort, size=100)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>avg_rating</th>\n",
       "      <th>count</th>\n",
       "      <th>dc_vendorId</th>\n",
       "      <th>reviewerId</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>10392</th>\n",
       "      <td>0.0</td>\n",
       "      <td>94.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>10392</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>17033</th>\n",
       "      <td>0.0</td>\n",
       "      <td>51.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>17033</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>21046</th>\n",
       "      <td>0.0</td>\n",
       "      <td>25.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>21046</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>11479</th>\n",
       "      <td>0.0</td>\n",
       "      <td>20.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>11479</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>27448</th>\n",
       "      <td>0.0</td>\n",
       "      <td>19.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>27448</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>17602</th>\n",
       "      <td>0.0</td>\n",
       "      <td>15.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>17602</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8185</th>\n",
       "      <td>0.0</td>\n",
       "      <td>15.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>8185</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>13984</th>\n",
       "      <td>0.0</td>\n",
       "      <td>10.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>13984</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>228129</th>\n",
       "      <td>0.0</td>\n",
       "      <td>9.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>228129</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>25267</th>\n",
       "      <td>0.0</td>\n",
       "      <td>8.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>25267</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>53432</th>\n",
       "      <td>0.0</td>\n",
       "      <td>8.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>53432</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>19813</th>\n",
       "      <td>0.0</td>\n",
       "      <td>7.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>19813</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>135506</th>\n",
       "      <td>0.0</td>\n",
       "      <td>6.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>135506</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>11987</th>\n",
       "      <td>0.0</td>\n",
       "      <td>6.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>11987</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "        avg_rating  count  dc_vendorId reviewerId\n",
       "10392          0.0   94.0          1.0      10392\n",
       "17033          0.0   51.0          1.0      17033\n",
       "21046          0.0   25.0          1.0      21046\n",
       "11479          0.0   20.0          1.0      11479\n",
       "27448          0.0   19.0          1.0      27448\n",
       "17602          0.0   15.0          1.0      17602\n",
       "8185           0.0   15.0          1.0       8185\n",
       "13984          0.0   10.0          1.0      13984\n",
       "228129         0.0    9.0          1.0     228129\n",
       "25267          0.0    8.0          1.0      25267\n",
       "53432          0.0    8.0          1.0      53432\n",
       "19813          0.0    7.0          1.0      19813\n",
       "135506         0.0    6.0          1.0     135506\n",
       "11987          0.0    6.0          1.0      11987"
      ]
     },
     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "hits_to_df(response)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "For example, reviewer 10392 gives 94 zero star reviews to vendor 122"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [],
   "source": [
    "q =  \"reviewerId:10392\"\n",
    "\n",
    "response = es.search(index='anonreviews', q=q, size=5) # top 5 only"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>date</th>\n",
       "      <th>rating</th>\n",
       "      <th>reviewerId</th>\n",
       "      <th>vendorId</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>2006-06-11 09:14</td>\n",
       "      <td>0</td>\n",
       "      <td>10392</td>\n",
       "      <td>122</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>2006-06-13 14:19</td>\n",
       "      <td>0</td>\n",
       "      <td>10392</td>\n",
       "      <td>122</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>2006-06-15 21:03</td>\n",
       "      <td>0</td>\n",
       "      <td>10392</td>\n",
       "      <td>122</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>2006-06-17 09:22</td>\n",
       "      <td>0</td>\n",
       "      <td>10392</td>\n",
       "      <td>122</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>2006-05-17 06:52</td>\n",
       "      <td>0</td>\n",
       "      <td>10392</td>\n",
       "      <td>122</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "               date rating reviewerId vendorId\n",
       "0  2006-06-11 09:14      0      10392      122\n",
       "1  2006-06-13 14:19      0      10392      122\n",
       "2  2006-06-15 21:03      0      10392      122\n",
       "3  2006-06-17 09:22      0      10392      122\n",
       "4  2006-05-17 06:52      0      10392      122"
      ]
     },
     "execution_count": 15,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "hits_to_df(response, False)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Find 'fanboys'\n",
    "\n",
    "Reviewers that give more than five five star reviews to one vendor"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {},
   "outputs": [],
   "source": [
    "q =  \"dc_vendorId:1 AND count :>5 AND avg_rating:5\"\n",
    "sort = \"count:desc\"\n",
    "\n",
    "response = es.search(index='anonreviews_pivot', q=q, sort=sort, size=100)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>avg_rating</th>\n",
       "      <th>count</th>\n",
       "      <th>dc_vendorId</th>\n",
       "      <th>reviewerId</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>183751</th>\n",
       "      <td>5.0</td>\n",
       "      <td>73.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>183751</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>260225</th>\n",
       "      <td>5.0</td>\n",
       "      <td>69.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>260225</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>205864</th>\n",
       "      <td>5.0</td>\n",
       "      <td>35.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>205864</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>345080</th>\n",
       "      <td>5.0</td>\n",
       "      <td>23.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>345080</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>179944</th>\n",
       "      <td>5.0</td>\n",
       "      <td>22.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>179944</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>345082</th>\n",
       "      <td>5.0</td>\n",
       "      <td>21.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>345082</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>345068</th>\n",
       "      <td>5.0</td>\n",
       "      <td>20.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>345068</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>345081</th>\n",
       "      <td>5.0</td>\n",
       "      <td>20.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>345081</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>345069</th>\n",
       "      <td>5.0</td>\n",
       "      <td>19.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>345069</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>345070</th>\n",
       "      <td>5.0</td>\n",
       "      <td>18.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>345070</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>345083</th>\n",
       "      <td>5.0</td>\n",
       "      <td>18.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>345083</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>345086</th>\n",
       "      <td>5.0</td>\n",
       "      <td>18.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>345086</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>345084</th>\n",
       "      <td>5.0</td>\n",
       "      <td>17.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>345084</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>345085</th>\n",
       "      <td>5.0</td>\n",
       "      <td>17.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>345085</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>264635</th>\n",
       "      <td>5.0</td>\n",
       "      <td>13.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>264635</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>321206</th>\n",
       "      <td>5.0</td>\n",
       "      <td>12.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>321206</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>12539</th>\n",
       "      <td>5.0</td>\n",
       "      <td>11.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>12539</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>159035</th>\n",
       "      <td>5.0</td>\n",
       "      <td>10.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>159035</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>114661</th>\n",
       "      <td>5.0</td>\n",
       "      <td>9.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>114661</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>39655</th>\n",
       "      <td>5.0</td>\n",
       "      <td>8.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>39655</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>22515</th>\n",
       "      <td>5.0</td>\n",
       "      <td>7.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>22515</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>180082</th>\n",
       "      <td>5.0</td>\n",
       "      <td>7.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>180082</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>58447</th>\n",
       "      <td>5.0</td>\n",
       "      <td>7.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>58447</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>160018</th>\n",
       "      <td>5.0</td>\n",
       "      <td>6.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>160018</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>168143</th>\n",
       "      <td>5.0</td>\n",
       "      <td>6.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>168143</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>180085</th>\n",
       "      <td>5.0</td>\n",
       "      <td>6.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>180085</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>28814</th>\n",
       "      <td>5.0</td>\n",
       "      <td>6.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>28814</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>30474</th>\n",
       "      <td>5.0</td>\n",
       "      <td>6.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>30474</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>35048</th>\n",
       "      <td>5.0</td>\n",
       "      <td>6.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>35048</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>393237</th>\n",
       "      <td>5.0</td>\n",
       "      <td>6.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>393237</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>75010</th>\n",
       "      <td>5.0</td>\n",
       "      <td>6.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>75010</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "        avg_rating  count  dc_vendorId reviewerId\n",
       "183751         5.0   73.0          1.0     183751\n",
       "260225         5.0   69.0          1.0     260225\n",
       "205864         5.0   35.0          1.0     205864\n",
       "345080         5.0   23.0          1.0     345080\n",
       "179944         5.0   22.0          1.0     179944\n",
       "345082         5.0   21.0          1.0     345082\n",
       "345068         5.0   20.0          1.0     345068\n",
       "345081         5.0   20.0          1.0     345081\n",
       "345069         5.0   19.0          1.0     345069\n",
       "345070         5.0   18.0          1.0     345070\n",
       "345083         5.0   18.0          1.0     345083\n",
       "345086         5.0   18.0          1.0     345086\n",
       "345084         5.0   17.0          1.0     345084\n",
       "345085         5.0   17.0          1.0     345085\n",
       "264635         5.0   13.0          1.0     264635\n",
       "321206         5.0   12.0          1.0     321206\n",
       "12539          5.0   11.0          1.0      12539\n",
       "159035         5.0   10.0          1.0     159035\n",
       "114661         5.0    9.0          1.0     114661\n",
       "39655          5.0    8.0          1.0      39655\n",
       "22515          5.0    7.0          1.0      22515\n",
       "180082         5.0    7.0          1.0     180082\n",
       "58447          5.0    7.0          1.0      58447\n",
       "160018         5.0    6.0          1.0     160018\n",
       "168143         5.0    6.0          1.0     168143\n",
       "180085         5.0    6.0          1.0     180085\n",
       "28814          5.0    6.0          1.0      28814\n",
       "30474          5.0    6.0          1.0      30474\n",
       "35048          5.0    6.0          1.0      35048\n",
       "393237         5.0    6.0          1.0     393237\n",
       "75010          5.0    6.0          1.0      75010"
      ]
     },
     "execution_count": 17,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "hits_to_df(response)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Reviewer 183751 gives 73 five star reviews to vendor 190"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {},
   "outputs": [],
   "source": [
    "q =  \"reviewerId:183751\"\n",
    "\n",
    "response = es.search(index='anonreviews', q=q, size=5) # top 5 only"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>date</th>\n",
       "      <th>rating</th>\n",
       "      <th>reviewerId</th>\n",
       "      <th>vendorId</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>2006-09-22 16:36</td>\n",
       "      <td>5</td>\n",
       "      <td>183751</td>\n",
       "      <td>190</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>2006-09-22 16:36</td>\n",
       "      <td>5</td>\n",
       "      <td>183751</td>\n",
       "      <td>190</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>2006-09-22 16:35</td>\n",
       "      <td>5</td>\n",
       "      <td>183751</td>\n",
       "      <td>190</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>2006-09-22 15:53</td>\n",
       "      <td>5</td>\n",
       "      <td>183751</td>\n",
       "      <td>190</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>2006-09-22 15:53</td>\n",
       "      <td>5</td>\n",
       "      <td>183751</td>\n",
       "      <td>190</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "               date rating reviewerId vendorId\n",
       "0  2006-09-22 16:36      5     183751      190\n",
       "1  2006-09-22 16:36      5     183751      190\n",
       "2  2006-09-22 16:35      5     183751      190\n",
       "3  2006-09-22 15:53      5     183751      190\n",
       "4  2006-09-22 15:53      5     183751      190"
      ]
     },
     "execution_count": 19,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "hits_to_df(response, False)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.8"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
